DOCUMENT RESUME 



ED 459 761 



IR 058 255 



AUTHOR          Nichols, Stephen G.
TITLE           Artifacts in Digital Collections.
PUB DATE        2001-08-00
NOTE            9p.; In: Libraries and Librarians: Making a Difference in
                the Knowledge Age. Council and General Conference:
                Conference Programme and Proceedings (67th, Boston, MA,
                August 16-25, 2001); see IR 058 199.
AVAILABLE FROM  For full text: http://www.ifla.org.
PUB TYPE        Reports - Descriptive (141) -- Speeches/Meeting Papers (150)
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     Access to Information; *Electronic Publishing; Evaluation
                Criteria; Information Technology; *Library Materials;
                Library Role; *Library Services; *Preservation
IDENTIFIERS     Artifacts; Digital Data; *Electronic Resources; *Information
                Value



ABSTRACT 



This paper discusses the preservation of digital resources 
in research libraries. The first section addresses the preservation 
imperative, including the challenges of quantity, stability of media, 
economics, and the contingent value of artifacts. The second section 
describes the artifact in question (i.e., an information resource in which 
the information is recorded on a physical medium, which may or may not be 
unique, and in which the information value of the resource adheres not only 
in the text or content, but also in the object itself). Topics covered in
this section include definition of the term "artifact," selection for the
preservation of the original format, and mechanisms for determining value. 



(MES) 



Reproductions supplied by EDRS are the best that can be made 
from the original document. 







67th IFLA Council and General Conference
August 16-25, 2001




Code Number: 084-168-E
Division Number: VI
Professional Group: Preservation and Conservation
Joint Meeting with: Information Technology
Meeting Number: 168
Simultaneous Interpretation:






Artifacts in digital collections 
Stephen G. Nichols 

Romance Languages Department, Johns Hopkins University 
Chair, Task Force on the Artifact in Library Collections
Baltimore, USA 

E-mail: stephen.nichols@jhu.edu






1. The Preservation Imperative 

The “information explosion” sparked by digital technology has fostered an increasing awareness of the 
sheer mass of information available today in a variety of media, from traditional formats like paper to the 
more recent film, optical, and magnetic formats. Today, the world produces two billion gigabytes of new 
information a year, or roughly 250 megabytes for every man, woman, and child on earth (Berkeley 2000). 
Institutions charged with collecting, storing, preserving, and making accessible recorded information have 
struggled to keep pace with the growth of information production, even though their brief is to collect only a 
portion of what is published and an even smaller portion of what is produced and disseminated in 
unpublished form. 

Although information overload is not a new problem, the introduction of digital technology onto campuses 
and into research libraries has fundamentally altered the information landscape, with potentially serious 
ramifications for scholars and students. The creation and dissemination of digital resources are creating new 
models of service and access, such as licensing rather than owning essential intellectual assets. The 
mutability of digital documents is redefining what constitutes a text — are back issues of a journal that are in 
digital form a bunch of articles or a rich database? And the conversion of texts into hypertexts is resulting 
in increased interdisciplinary research, when a researcher in one field serendipitously finds resources
normally confined (in print) to another. 









Accompanying these trends is another series of trends that at first seem paradoxical. There is increased 
scholarly attention to original, unreformatted materials. And there are eruptions of public outcry when 
discoveries of material losses in libraries and archives are made. While scholars demand ever-increasing 
attention to an ever-expanding range of candidates for preservation, library budgets simply do not support 
those demands. Preservation has thus become an “unfunded mandate,” the more pernicious for being often 
implicit. Academic institutions have learned the huge costs of penny-wise facilities management and 
“deferred maintenance.” It is reasonable to fear that we are now incurring similar future costs by deferring 
preservation. With preservation, though, there is a crucial difference: a significant part of what we avoid 
thinking about today will be lost forever by our neglect. Library collections are among the most valued of a 
research institution’s intellectual and cultural assets, assets that form a crucial part of what might be called 
“public goods.” But, as is the case with public goods, many lay claim to their use, but few accept
responsibility for their well-being. This paper focuses on preservation (what it takes to ensure the present
and future usability of collections), for without preservation today, there will be no access tomorrow.

Preservation is a critical part of good stewardship of our intellectual and cultural heritage. Its chief 
challenges at the turn of the twenty-first century are fourfold: 

Quantity. Because of the relentless growth of research libraries and their collections, an immense number 
of research items demand resources to remain accessible. In 1999, the 121 member libraries of the 
Association of Research Libraries (ARL) reported owning a combined total of 462,965,204 volumes. The 
greatest periods of growth for research libraries occurred after the First World War, after the Second World
War, and in the last decade. Looking at 12 representative public university libraries across this span
reveals a typical growth pattern. 1 In 1907, these libraries held an average of 107,425 volumes; by 1961,
that number had grown to 1,772,831, and by 1995, the average per library was 5,334,620 (Gerould).

Stability of media. Library collections exist in a variety of physical formats that are vulnerable to one 
degree or another. As the rate of information production has increased, the storage medium has become 
more compact and efficient. However, this miniaturization comes at the expense of stability and longevity.
With the exception of preservation-quality microfilm, the new media of the twentieth century are more
fragile than those of the nineteenth, including the by-now infamous wood-pulp paper that deteriorates into 
flakes over time (Conway 1996, 5). The many media that have been invented in the last 150 years to capture 
light and sound are generally extremely fragile, dependent on machines for playback, and subject to rapid 
technological obsolescence. The wax cylinders on which are inscribed the earliest known recorded voices of 
Native Americans are susceptible to mold, heat, scratching, skin oils, and physical trauma. Moreover, they 
are dependent on playback equipment for which no spare parts are manufactured and for which there are 
few technicians skilled in repair. Yet the information on them is invaluable and must be saved for future 
generations. 

Economics. Since 1993, preservation budgets in ARL libraries have remained flat and, in real dollars, are 
shrinking. At the same time, the number of staff assigned to preservation is at a ten-year low (Scott 1999). 
Yet the demand for access to original materials has grown, especially to those materials in special formats 
that often are at greatest risk from physical handling or environmental stress. Meanwhile, as the technology 
for reformatting for access has greatly improved, the funding for preservation continues to decrease, since 
more money goes to digital reformatting of items in high demand and less to microfilming the low-use brittle 
books that are rotting on shelves. 



1 The libraries are: University of California, Berkeley; University of Illinois; Indiana University; University of Iowa; University of 
Kansas; University of Michigan; University of Minnesota; University of Missouri; University of Nebraska; Ohio State University; 
University of Washington (Seattle); and the University of Wisconsin. 






Contingent value of artifacts. The most difficult challenge for libraries is deciding how to set priorities for
preservation. As long as the claim on preservation resources exceeds the available funds, from both internal 
and external sources, there will be a necessary triage of materials that get treatment. An example of this
contingency is the recent elevation of nineteenth-century popular imprints and ephemera to a status of high
research value. Providing appropriate access to those items, which are at high risk from embrittlement and 
ordinary physical handling, has put an enormous strain on library resources. Knowing that the intellectual 
interests and research methodologies of scholarship will change over time, research institutions have 
collected “just in case” there is a demand in the future, rather than “just in time” for demand. Research 
libraries and archives are full of items that have not been consulted in decades, if ever, and whose future 
demand is unpredictable. How do a library and its home institution make sound fiscal and intellectual 
decisions about what to preserve, when, for whom, and at what price? Despite the enormous collections of 
printed materials that have been amassed, entire categories of primary sources have disappeared before 
collecting institutions and their users understood their value. The most notorious example of neglect is the 
fact that 80 percent of all silent films made in the United States are gone without a trace, and 50 percent of 
films made before the Second World War have also perished and will never be recovered. The loss is so 
great that we do not even know what has been lost. 
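As a rough illustration of the growth figures cited under Quantity above, the sketch below converts the three average-holdings data points into an overall fold increase and an implied compound annual growth rate. The compound-growth model, the function names, and the reading of the 1961 figure as 1,772,831 are assumptions made for this illustration; the source reports only the three averages and does not describe such a calculation.

```python
# Average holdings per library, as cited in the text (Gerould).
# Assumption: the 1961 figure is read as 1,772,831.
HOLDINGS = {1907: 107_425, 1961: 1_772_831, 1995: 5_334_620}


def annual_growth_rate(start_year, end_year, holdings=HOLDINGS):
    """Compound annual growth rate implied by two data points.

    This assumes smooth compound growth between the two years, which is
    a simplification: the text itself says growth came in bursts.
    """
    years = end_year - start_year
    ratio = holdings[end_year] / holdings[start_year]
    return ratio ** (1 / years) - 1


def growth_summary(holdings=HOLDINGS):
    """Return (start, end, fold increase, annual rate) per consecutive pair."""
    years = sorted(holdings)
    return [
        (a, b, holdings[b] / holdings[a], annual_growth_rate(a, b, holdings))
        for a, b in zip(years, years[1:])
    ]


if __name__ == "__main__":
    for start, end, fold, rate in growth_summary():
        print(f"{start}-{end}: {fold:.1f}x overall, about {rate:.1%} per year")
```

Under these assumptions, average holdings grew on the order of 5 percent a year between 1907 and 1961 and on the order of 3 percent a year between 1961 and 1995. Because each rate is compounded on an ever larger base, even the slower later rate adds far more volumes per year than the earlier one did.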

2. The Artifact in Question 

“Artifact” is a term that can be confusing because it masks a number of unexamined assumptions. In 
common academic parlance, “artifact” can be used to mean a physical object, a primary record, or a 
physical object that constitutes a primary record. 2 From the point of view of a researcher, however, and for 
the purposes of this report, an artifact can be defined as an information resource in which the information is 
recorded on a physical medium (such as a photograph or a book), which may or may not be unique 
(holograph letter or a paperback book), and in which the information value of the resource adheres not only 
in the text or content, but also in the object itself. In other words, artifacts are things that have intrinsic 
value, independent of the informational content. 

Recently, scholars have identified an increasing number of library items that have intrinsic research value as 
physical objects. The Modern Language Association (MLA) defines an artifact or primary record as “a
physical object produced or used at the particular past time that one is concerned with in a given instance” 
(MLA 1996). The definition goes on to assert that, for all practical purposes, all historical publications, 
even those produced in mass quantities by mass production techniques designed to minimize deviations from 
a norm, may have some qualities of uniqueness. 3 Since uniqueness is one of the defining features of 
something that has artifactual value, this position would seem valid. The problem for libraries, however, is 
that by definition these items would therefore have a claim not only on the attention of the scholar, but also 
on the budgets of preservation departments. Whether one rejects or accepts in toto this definition and its
implications, the difficulty it presents for libraries is that they are not, have not been, and are unlikely ever 
to be funded to meet the need to collect and preserve literally everything of potential research value. For 
libraries, this expansive view of artifactual value presents problems that are not primarily theoretical, but 
eminently practical. 

Given the task of identifying achievable, fundable preservation strategies and goals for libraries, we must
identify intrinsic artifactual value, and do so in a way that, following the spirit of preservation principles,
accepts some loss as inevitable and seeks rather to manage the risks of unacceptable loss.



2 Of course, in scientific laboratories, "artifact" also denotes a substance that is a by-product of some external action or agent. 

3 The statement addresses only text-based sources. If this standard of value were extended to visual and sound resources, the 
universe of primary records would grow exponentially.







2.1 Selection for Preservation of Original 

The library preservation community has agreed on certain cardinal features of physical objects that warrant 
preservation in their original formats. They are: 

Age
Evidential value
Aesthetic value
Scarcity
Associational value
Market value
Exhibition value

Objective criteria or established practice determine many of these features, and the criteria vary little among 
libraries. They are, in short, best practice; several written policies are included in Appendix III as 
meaningful examples. 

We are not talking here about those categories of artifacts that are always, as a matter of course, retained in 
the original. That would mean books printed before 1801, which are usually segregated from general 
holdings in a rare book collection and subject to different handling and preservation protocols; manuscripts 
and archival materials that exist only in one copy; items with high market value, and so on.

The value of the artifact for research purposes — as opposed to monetary value or exhibition value — can be 
thought of as chiefly evidentiary. An artifact is of value to the extent that it testifies to the information being 
original, faithful, fixed, or stable. 

Originality: An original manifestation of a book, photograph, or recorded performance is valuable because
through it a scholar may come closer to uncovering the original intent of the creator or publisher, or
both. 4 When copies yield insufficient information about original intent, then access to the original may be 
needed. Reformatting and copying information is analogous to translation from one language to another. 
Depending on the source and the target language, as well as the skill, care, and cultural biases of the 
translator, something is always lost. A good translation, like good recopying, is one that loses the least 
amount of original content and intent. 

Fidelity: The artifact is also useful and at times essential in establishing the authenticity of an item. In other
words, it has forensic value. How does one know that the item in one’s hands is what it purports to be? 
There are internal clues in a document that give evidence of authenticity, of course, such as the accuracy 
and appropriateness of the content. A diary that is dated 1901 and contains references to television 
broadcasts, for example, is unlikely to be what it purports to be. But in addition to the intellectual 
information, there is also the external information contained in a physical manifestation that also provides 
clues of authenticity and integrity. Erasure marks on a sheet of paper, splices in a film, white-out on 
property maps — all these are physical clues to the integrity of the object and, hence, the authenticity of the 
information recorded in the object. 



4 The instances of published versions differing from the presumed intent of the creator are legion in books, films, and so forth. In 
those cases, the sources that contain information about the work in pre-publication form (e.g., draft, outtakes, proof sheets) are also 
required to reconstruct original intent. 









Fixity: The content of the artifact when it was first produced constitutes the text (in the case of textual
materials) or the document (in the case of a photograph, say, or an opera performance). If one is holding a 
fifteenth-generation fax, one cannot guarantee that the full content of the original is conveyed except by 
comparison with the original, which has fixed the content by recording it at one instant in time. One of the 
wonders of mechanical reproduction of text after Gutenberg was the way that replication by machine en 
masse tended to fix texts that had previously been taken to be somewhat fluid. This is one of the innovations 
of the print regime that appears to be eroding in the digital realm. 

Stability: The persistence of an object over time leads to the stable and continuous accessibility of the
information contained in it. Documents whose physical substrate changes over time themselves change. 
Film that gets spliced and repaired loses content; digital files that get reformatted into a newer version of a 
software program change; photographic images printed or displayed in various ways shift tone. When one
looks at a thirty-year-old image of a woman in a red coat printed on paper that fades, and looks at a 
contemporary image made from the same negative projected through a slide projector, chances are there will 
be two different reds that constitute “the coat.” The content of that red is not stable, and it is difficult to
mentally efface the effects of age and reformatting and determine whether the coat was originally scarlet red 
or crimson red. 

There are, in addition, artifacts valuable for research because the format is itself the subject of 
investigation. Original bindings carry evidence of print history, just as original daguerreotypes carry 
evidence of an early photographic technology. In these cases the object itself is the primary source, not the 
information it carries. 

Also of value to the research process is the physical encounter between the researcher and the object, an 
encounter that can help to prime the scholar's imaginative and analytical skills. While this factor is highly
subjective and difficult to quantify, it is commonly cited by scholars as being, at least at some stage of their 
career, of irreplaceable heuristic value. A medievalist who has never worked directly from manuscripts is at 
a disadvantage, just as a biographer of Thomas Jefferson who has worked exclusively from the printed 
editions of his letters may be said to work at one critical remove from the subject. Given the toll that 
physical handling takes on all types of materials, though, scholars and library professionals should accept 
the fact that surrogates can be judiciously used by those who have a familiarity with original source 
materials and, from the perspective of both preservation and convenience of access, are often preferable. 5 

2.2 Mechanisms for Determining Value 

Questions about the nature of the artifact have caused scholars and library professionals to realize that, 
even for the early part of the nineteenth century, much more information of potential research value exists in 
traditional formats, such as paper and image, than had previously been recognized. Consequently, the 
process of redefining what constitutes an artifact must be done not only for new media, but also for a 
considerable body of information from the 1800s. The fragility of paper-based materials, especially 
newspapers, printed since 1850 has been a concern for some time (Marley, Baker, Smith, Cox). Because of 
their fragility, preserving one or more instances of all imprints of newspapers poses enormous technical as 
well as financial challenges to collecting institutions. More recently, there has been a growing awareness of 
other kinds of artifacts from the nineteenth and early twentieth centuries that also require the attention of 



5 See, for example, the case of the editorial team working on the James Boswell Papers at the Beinecke Library 
(Bouché 1999). The international team of editors came to prefer the use of digital scans of the original manuscripts
to working from the originals in New Haven. In part this was the matter of convenience — the work could be done 
wherever the editors were located and obviated a series of disruptive trips to New Haven — and in part because the 
scans provided superior legibility. 






